Phonetic Level Annotation and Segmentation of Hungarian Speech Databases

Authors

  • Gyula Zsigri
  • András Kocsor
  • László Tóth
  • Györgyi Sejtes
Abstract

This paper outlines the phonetic-level annotation and segmentation of Hungarian speech databases, both at the level of definitions and at the level of speech technology. In addition to giving guidance on defining the content of a database, the technique of annotation, and the procedure of manual segmentation, we discuss mathematical models of computer-aided semi-automatic and automatic segmentation. Finally, we summarize our observations on applying these procedures, gathered while processing the MTBA Hungarian Telephone Speech Database.

1 Designing a Speech Database

Statistics-based speech processing, particularly automatic speech recognition, requires large, well-organized speech databases. Training a speech recognition program is based on statistical parameter estimation, and accurate parameter tuning requires training on a large number of samples. A proper training database is made up of collections of such samples accompanied by the necessary notes, labels and transcriptions. The database should include the observations required for parameter estimation and samples that cover the variability of speech (and of environmental noise).

A speech database is a large set of sound data that can be organized according to several grouping criteria. The size and internal structure of a database are usually determined by its area of use. To achieve reliably accurate recognition, the material should contain every typical variation that is likely to occur during recognition.

∗Presented at the 1st Conference on Hungarian Computational Linguistics, December 10–11, 2003, Szeged.
†Department of Hungarian Linguistics, University of Szeged, H-6722 Szeged, Egyetem utca 2., Hungary, e-mail: [email protected]
‡Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1., Hungary, e-mail: [email protected], [email protected]
§Department of Hungarian Linguistics, University of Szeged, H-6722 Szeged, Egyetem utca 2., Hungary, e-mail: [email protected]

Similar resources

Stages and method of preparing syllable and diphone speech databases for a Persian text-to-speech system

Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their advantages relative to each other. ...

Phonetic segmentation of singing voice using MIDI and parallel speech

When analyzing a singing voice signal, one needs to know the boundaries of each phonetic unit in the singing voice samples. However, because vowels are prolonged in singing, it is not easy to accurately align a singing voice with the phonetic sequence of its lyrics using a conventional speech recognition approach. This paper proposes a solution for the phonetic annotation of the singing voic...

Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment

Annotation is generally an indivisible part of a speech database. In this paper we present a common orthographic and phonetic annotation of large Czech databases. Phonetic annotation can be very important and gives more information than a pronunciation lexicon with possible pronunciation variants. Moreover, for the Czech language, phonetic annotation means just a small additional effort to standard ...

Automatic phonetic transcription of spontaneous speech (American English)

An automatic transcription system has been developed to label and segment phonetic constituents of spontaneous American English without benefit of a word-level transcript. Instead, special-purpose neural networks classify each 10-ms frame of speech in terms of articulatory-acoustic-based phonetic features and the feature clusters are subsequently mapped to phonetic-segment labels using multilay...

Speech Variation and the Use of Distance Metrics on the Articulatory Feature Space

This paper describes ongoing research on the relation between variation in speech in the articulatory-acoustic domain and the variation as represented in the symbolic domain. More specifically, we address variation in speech as represented by articulatory features, and the description of variation in phone annotation and segmentation. Variation in speech is quantified by using distance metrics ...

Journal title:
  • Acta Cybern.

Volume 16

Publication date 2004